Recipes for open vocabulary keyword spotting #1428
Hello, what's the current progress of this PR? Thanks!
Developing the runtime first (see k2-fsa/sherpa-onnx#505); will clean up this PR soon.
Would it be possible to implement a KWS system using the output of the CTC branch, transforming it into a lattice so that Kaldi's decoders can be used, similar to what is done with kaldi-decoder/faster-decoder.h and kaldi-decoder/decodable-ctc.h? Which parts of the Kaldi code would need to be implemented in kaldi-decoder to achieve that? Could you give me some direction?
@alucassch I do plan to use the CTC branch, but I don't think I will use the Kaldi decoders. If you want to use the Kaldi decoders: you can compile the keywords into a lattice, then decode the audio with this lattice (a faster decoder should be enough, I think). Then, for each frame (or chunk), you match the suffix of the decoded results against the keyword candidates; if there is a match and the log-prob is larger than a given threshold, the corresponding keyword is triggered. Sorry, I don't have much experience in this direction, so this is just my thought; you can try it yourself.
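For illustration, here is a minimal Python sketch of the suffix-matching step described above. It is not this PR's implementation; it assumes some external decoder yields the running list of decoded token ids plus per-token log-probs, and all names (`match_keyword_suffix`, `keywords`, `threshold`) are hypothetical.

```python
# A toy sketch of suffix matching with a log-prob threshold, as described
# above. Illustrative only; not the PR's actual decoder.
from typing import Dict, List, Optional, Tuple


def match_keyword_suffix(
    decoded: List[int],               # token ids decoded so far
    logprobs: List[float],            # per-token log-probs, aligned with `decoded`
    keywords: Dict[str, List[int]],   # keyword name -> token-id sequence
    threshold: float = -5.0,          # minimum total log-prob to trigger
) -> Optional[Tuple[str, float]]:
    """Return (keyword, score) if a keyword matches the suffix of `decoded`
    and its accumulated log-prob exceeds `threshold`, else None."""
    for name, tokens in keywords.items():
        n = len(tokens)
        if 0 < n <= len(decoded) and decoded[-n:] == tokens:
            score = sum(logprobs[-n:])
            if score > threshold:
                return name, score
    return None


# Usage: call once per decoded frame/chunk.
hit = match_keyword_suffix(
    decoded=[12, 7, 31],
    logprobs=[-0.2, -0.5, -0.3],
    keywords={"hey snips": [12, 7, 31]},
)
if hit is not None:
    print(f"triggered: {hit[0]} (logprob {hit[1]:.2f})")
```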
Here are some results of this PR; you can find more details in the RESULTS.md of each recipe.

English: the positive set is from https://github.com/pkufool/open-commands; the negative set is the test set of gigaspeech. Each metric has two columns, one for the original model trained on gigaspeech, the other for the fine-tuned model. (Tables for "small" and "large" are in RESULTS.md.)

Chinese: the positive set is from https://github.com/pkufool/open-commands; the negative set is the test-net set of wenetspeech. Each metric has two columns, one for the original model trained on wenetspeech, the other for the fine-tuned model. (Tables for "small" and "large and others" are in RESULTS.md.)
Are these numbers the result of extensive search, or chosen with some intuition? Thanks!
No, just with some intuition. We are also searching for better and smaller models.
Hello, is there a pretrained PyTorch (.pt) model from before the KWS fine-tuning? I only see the ONNX ones.
See RESULTS.md; the links are there.
@KIM7AZEN Thanks! Could you make a PR to fix it?
OK, wait a moment.
What do 'small' and 'large and others' mean here? Are they referring to the sizes of the models or the sizes of different test sets? Why does the larger one seem to perform worse than the smaller one?
@zhuangweiji They are test sets; see https://github.com/k2-fsa/icefall/blob/master/egs/wenetspeech/KWS/RESULTS.md for more details.
This is an initial version of the decoder for an open vocabulary keyword spotting system. The idea is almost the same as the context biasing system we proposed before; I improved the `ContextGraph` so that users can trade off `recall` and `precision` easily (a rough sketch of the idea follows below). I also trained some small zipformer models (around 3M parameters) on gigaspeech (for English) and wenetspeech (for Chinese) for keyword spotting; I will update the results and models in the following commits soon.
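To make the recall/precision trade-off concrete, here is a hedged sketch of the underlying mechanism: a token trie where a per-token boost raises recall and a trigger threshold on the averaged score raises precision. This only illustrates the idea; the actual `ContextGraph` in icefall is more elaborate (e.g. it handles partial matches and backoff), and every name below is made up.

```python
# Illustrative only: a toy keyword trie showing how a boost and a threshold
# trade off recall and precision. Not the actual icefall ContextGraph.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrieNode:
    children: Dict[int, "TrieNode"] = field(default_factory=dict)
    keyword: Optional[str] = None  # set on the last token of a keyword


class ToyKeywordGraph:
    def __init__(self, boost: float, threshold: float):
        self.root = TrieNode()
        self.boost = boost          # added per matched token; larger -> higher recall
        self.threshold = threshold  # required average score; larger -> higher precision

    def add_keyword(self, name: str, tokens: List[int]) -> None:
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())
        node.keyword = name

    def match(self, tokens: List[int], logprobs: List[float]) -> Optional[str]:
        """Walk the trie over a decoded suffix; trigger when a complete
        keyword's boosted average log-prob clears the threshold."""
        node, total = self.root, 0.0
        for i, (t, lp) in enumerate(zip(tokens, logprobs), start=1):
            if t not in node.children:
                return None
            node = node.children[t]
            total += lp + self.boost
            if node.keyword is not None and total / i >= self.threshold:
                return node.keyword
        return None


# Example: a generous boost with a strict threshold.
graph = ToyKeywordGraph(boost=1.0, threshold=-0.5)
graph.add_keyword("hey snips", [12, 7, 31])
print(graph.match([12, 7, 31], [-0.4, -0.9, -0.6]))  # -> "hey snips"
```

Raising `boost` lets acoustically weak matches survive (more recall, more false alarms); raising `threshold` suppresses marginal matches (more precision, more misses), which is the knob pair this PR exposes to users.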